Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exporter/containerimage: new option: rewrite-timestamp (Apply SOURCE_DATE_EPOCH to file timestamps) #4057

Merged
merged 2 commits into from
Sep 14, 2023

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Jul 24, 2023

rewrite-timestamp rewrites timestamps in layers for reproducible builds.

When a layer is rewritten, an annotation buildkit/rewritten-timestamp=<INT64> is set to the layer.

Example:

buildctl build
  --frontend dockerfile.v0 \
  --opt build-arg:SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH} \
  --output type=oci,dest=dest.tar,rewrite-timestamp=true \
  ...

Alternative to:


With this commit, the following workarounds mentioned in docs/build-repro.md are no longer needed for reproducible builds:

# Limit the timestamp upper bound to SOURCE_DATE_EPOCH.
# Workaround for https://github.com/moby/buildkit/issues/3180
ARG SOURCE_DATE_EPOCH
RUN find $( ls / | grep -E -v "^(dev|mnt|proc|sys)$" ) -newermt "@${SOURCE_DATE_EPOCH}" -writable -xdev | xargs touch --date="@${SOURCE_DATE_EPOCH}" --no-dereference
# Squash the entire stage for resetting the whiteout timestamps.
# Workaround for https://github.com/moby/buildkit/issues/3168
FROM scratch
COPY --from=0 / /
Backup

39c3a49 : Abandoned version, contaminates cache

@AkihiroSuda AkihiroSuda changed the title exporter/containerimage: new option: rewrite-epoch exporter/containerimage: new option: rewrite-epoch (Apply SOURCE_DATE_EPOCH to file timestamps) Jul 24, 2023
@AkihiroSuda AkihiroSuda added this to the v0.13 milestone Jul 24, 2023
@AkihiroSuda AkihiroSuda force-pushed the rewrite-epoch branch 2 times, most recently from 15e643d to f54de34 Compare July 25, 2023 13:42
@AkihiroSuda
Copy link
Member Author

@tonistiigi
Copy link
Member

@AkihiroSuda I think until we figure out how this will work with cache-to/from we need to make sure that if rewrite-epoch is set then these blobs will never be 1) exported with cache, also meaning inline-cache would probably need to be incompatible 2) reused in any subsequent build that doesn't set rewrite-epoch to the same value (this is contrary to the compression behavior atm where existing compression is reused).

@AkihiroSuda AkihiroSuda marked this pull request as draft August 4, 2023 14:39
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Aug 7, 2023

@AkihiroSuda I think until we figure out how this will work with cache-to/from we need to make sure that if rewrite-epoch is set then these blobs will never be 1) exported with cache, also meaning inline-cache would probably need to be incompatible 2) reused in any subsequent build that doesn't set rewrite-epoch to the same value (this is contrary to the compression behavior atm where existing compression is reused).

I guess the entire logic should be just decoupled from cache/, and the layer blobs should be just patched in exporter/containerimage/writer.go:commitDistributionManifest() ?

(So, the epoch will be applied to the base layers as well)

@AkihiroSuda AkihiroSuda marked this pull request as ready for review August 7, 2023 17:19
@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Aug 7, 2023

Updated as above.

The rewrite-epoch option no longer touches the cache.
The blobs are converted right before pushing, in exporter/containerimage/writer.go:commitDistributionManifest().

@AkihiroSuda AkihiroSuda force-pushed the rewrite-epoch branch 3 times, most recently from 46f3787 to 4963e73 Compare August 7, 2023 17:39
@AkihiroSuda AkihiroSuda requested a review from tonistiigi August 10, 2023 00:30
@felixdesouza
Copy link

I'm really excited for this to arrive! This will save us a lot of time spent doing the docker save -> strip timestamps -> docker load` pipeline. I had a go earlier today on a toy example, and for the most part it worked great. I did run into one issue however:

If I have this Dockerfile:

FROM alpine:latest
ARG SOURCE_DATE_EPOCH

RUN echo felix > asdf.txt

The last layer is not a valid tar file. Namely because it's small enough to be missing padding. In addition does not have the tar end marker, although that could be an implementation detail of tar (I am on macOS). However, without the rewrite, this works as normal and produces a tar file that's readable by macOS tar, so in a way it is a bit of a regression.

Below is the fq output of the stripped tar that's too small:
Screenshot 2023-08-25 at 15 30 11

Below is the fq output of the same layer but without any epoch rewriting:
Screenshot 2023-08-25 at 15 30 25

I haven't attached, but in the non epoch rewritten tar, there's also the end_marker padding that's 1024 bytes long. This obviously doesn't exist in the epoch rewritten tar given that an individual file within the tar is missing the padding block.

Not entirely sure whether this is the right place for this issue, but just wanted to flag.

@AkihiroSuda
Copy link
Member Author

@felixdesouza Thanks, fixed in the latest commit. (the tar writer was not flushed)

@felixdesouza
Copy link

Ah, beat me to it! I think there's one last flush missing, at the end of the archive, we'll get an io.error.EOF, which is the point in which the archive should be closed to write the padding + footer: https://pkg.go.dev/archive/tar#Writer.Close. But I think we'll also miss that without an error check.

@AkihiroSuda AkihiroSuda force-pushed the rewrite-epoch branch 2 times, most recently from fdde4db to 1ed7a11 Compare August 25, 2023 19:03
@AkihiroSuda
Copy link
Member Author

Should be fixed in the latest commit

@wiktor-k
Copy link

Great to see this being worked on, thanks!

I wonder one thing: would it be beneficial to also add test cases that'd check if the digest does not change? Having reproducible images is super great but I'm worried some other contributor will break it by accident in the future.

@wiktor-k
Copy link

Already checked (frontend/dockerfile/dockerfile_test.go)

Excellent, thanks for the confirmation 👍

@AkihiroSuda
Copy link
Member Author

@tonistiigi PTAL 🙏

#4057 (comment)

@@ -422,7 +423,7 @@ func ensureCompression(ctx context.Context, ref *immutableRef, comp compression.
}

// Resolve converters
layerConvertFunc, err := getConverter(ctx, ref.cm.ContentStore, desc, comp)
layerConvertFunc, err := converter.New(ctx, ref.cm.ContentStore, desc, comp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be follow-up: This New pattern doesn't work with function anymore and should probably return a struct/interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New implies a new instance of a struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New implies a new instance of a struct.

I'm not sure.
I thought it can return any.

util/converter/tarconverter/tarconverter.go Outdated Show resolved Hide resolved
util/converter/tarconverter/tarconverter.go Outdated Show resolved Hide resolved
util/converter/converter.go Show resolved Hide resolved
// e.unpackImage cannot be used because src ref does not point to the rewritten image
err = errors.New("exporter option \"rewrite-epoch\" conflicts with \"unpack\"")
} else {
err = e.unpackImage(ctx, img, src, session.NewGroup(sessionID))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this function so that it takes Result[Remote] as parameter. This can be follow-up but I do think we should implement it and not error. Same for push.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure how it relates to this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are new conditions now for unpack and push, instead of reusing the code-path. And unpack case errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, but let me defer that to a follow-up 🙏

exporter/containerimage/writer.go Outdated Show resolved Hide resolved
@AkihiroSuda AkihiroSuda marked this pull request as draft September 13, 2023 08:48
@AkihiroSuda AkihiroSuda marked this pull request as ready for review September 13, 2023 09:04
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More follow-ups: when the layer conversion happens, we should show a progress line about it to the user.

exporter/containerimage/writer.go Outdated Show resolved Hide resolved
exporter/containerimage/writer.go Show resolved Hide resolved
@AkihiroSuda
Copy link
Member Author

More follow-ups: when the layer conversion happens, we should show a progress line about it to the user.

done

@AkihiroSuda
Copy link
Member Author

BTW, on the second thought, rewrite-epoch may sound weird as English?

rewrite-timestamp might be less confusing?
Or rewrite-layer-timestamp?

@AkihiroSuda
Copy link
Member Author

Changed to rewrite-timestamp

(Backup: commit 0265bb7 (rewrite-epoch))

@AkihiroSuda AkihiroSuda changed the title exporter/containerimage: new option: rewrite-epoch (Apply SOURCE_DATE_EPOCH to file timestamps) exporter/containerimage: new option: rewrite-timestamp (Apply SOURCE_DATE_EPOCH to file timestamps) Sep 14, 2023
@AkihiroSuda AkihiroSuda force-pushed the rewrite-epoch branch 3 times, most recently from 6a8e2e3 to 1b59dfc Compare September 14, 2023 03:01
rewrite-timestamp rewrites timestamps in layers for reproducible
builds.

When a layer is rewritten, an annotation "buildkit/rewritten-timestamp=<INT64>"
is set to the layer.

Example:
```
buildctl build
  --frontend dockerfile.v0 \
  --opt build-arg:SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH} \
  --output type=oci,dest=dest.tar,rewrite-timestamp=true \
  ...
```

Alternative to PR 3560

Signed-off-by: Akihiro Suda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants